Abstract—The progress on constructing quantum computers and the ongoing standardization of post-quantum cryptography (PQC) 
have led to the development and refinement of promising new digital signature schemes and key encapsulation mechanisms (KEM). 
Especially lattice-based schemes have gained some popularity in the research community, presumably due to acceptable key, 
ciphertext, and signature sizes as well as good performance results and cryptographic strength. However, in some practical 
applications like smart cards, it is also crucial to secure cryptographic implementations against side-channel and fault attacks. In this 
work, we analyze the so-called redundant number representation (RNR) that can be used to counter side-channel attacks. We show 
how to avoid security issues with the RNR due to unexpected de-randomization and we apply it to the Kyber KEM and show that the 
RNR has a very low overhead. We then verify the RNR methodology by practical experiments, using the non-specific t-test 
methodology and the ChipWhisperer platform. Furthermore, we present a novel countermeasure against fault attacks based on the 
Chinese remainder theorem (CRT). On an ARM Cortex-M4, our implementation of the RNR and fault countermeasure offers better 
performance than masking and redundant calculation. Our methods thus have the potential to expand the toolbox of a defender 
implementing lattice-based cryptography with protection against two common physical attacks. 


Index Terms—Lattice-Based Cryptography, Module-LWE, Kyber, Side-Channel Attacks, ARM Cortex-M4 


1 INTRODUCTION 


The importance of post-quantum cryptography (PQC)
has significantly grown over the past couple of years
with Shor’s polynomial-time algorithm for prime factorization [1],
advances in the construction of quantum computers [2],
and the ongoing NIST PQC standardization
process [3]. Some promising families of cryptosystems currently
in round 3 of the NIST process [4] are based on the
hardness of certain lattice problems. An advantage of lattice- 
based cryptography is that it allows constructing public-key 
encryption (PKE) and digital signatures at the same time 
with certain similarities in their structure. In addition, im- 
plementations of lattice-based schemes in the NIST process 
have proven to be quite efficient on embedded devices [5], 
with reasonable public key and ciphertext or signature sizes. 
Even though reference implementations already claim to 
have constant or secret-independent timing behavior, this
is not sufficient in a setting where an attacker may gain 
full control over a device, e.g., in smart cards for payment, 
digital identification documents, or digital signatures. In 
such scenarios, there is a need for appropriate side-channel 
and fault attack protection to prevent long-term secrets from 
getting compromised [6]. Thus, the development of efficient 
countermeasures against physical attacks, which take into 
account the constraints of embedded devices, is a prerequisite
before PQC can be deployed in the aforementioned
use cases.
A good overview of already existing implementations of 
PQC is given in and a good summary of attacks and
countermeasures can be found in [8]. Recent works have 
focused on the side-channel protection of components or 
of the whole decryption operation of lattice-based schemes 
that are based on the ring learning with errors (RLWE), ring 
learning with rounding (RLWR) or the modular versions 
of these assumptions (MLWE/MLWR). The implementation 
strategy on microcontrollers and reconfigurable hardware is 
usually to employ arithmetic masking to protect arithmetic 
operations in the polynomial ring $R_q = \mathbb{Z}_q[X]/(X^n + 1)$,
a masked decoder, sampling of noise polynomials using a 
masked sampler [9], and a masked comparison [10]. 

However, an interesting question is whether alternatives 
to straightforward arithmetic masking can be developed to 
appropriately protect computations in $R_q$ and how such
alternatives perform in comparison to masking in terms 
of security and efficiency. Some implementations that use 
masking are still susceptible to single-trace attacks under 
certain conditions [i], and would require further pro- 
tection. Such new techniques could then either be combined 
with masking to increase the security level or used as an 
alternative due to better performance. 

An approach that may complement masking is polynomial
blinding, in which two polynomials $f, g \in R_q$
are multiplied by a random integer $a \in \mathbb{Z}_q$ and its inverse $a^{-1}$, respectively, such that
$(af) \cdot (a^{-1}g) = f \cdot g$. In addition, Zijlstra, Bigou, and
Tisserand proposed the usage of the redundant number
representation (RNR). The core idea is that one randomizes
coefficients $c \in \mathbb{Z}_q$ in $R_q$ by adding a random value $r \cdot q$
where $r \in [0, 2^k)$ for some integer $k$. Computations are then
carried out mod $2^k q$ and the final result is obtained by
reducing mod q. The FPGA implementation of the counter- 
measure described in showed reasonable overhead and 
promising security properties in simulations. However, the 
redundant representation method may also be appealing on
a microcontroller where coefficients of typical lattice-based
KEMs are in the range of 12 bits (for Kyber in round
2/3), 13 bits (for Saber) to 14 bits (for NewHope) and thus
do not fill a 16-bit or 32-bit register.

Besides side-channel attacks, it is also important to
consider fault attacks, as an attacker might always choose the
path of least resistance or combine semi-invasive and non- 
invasive attacks. In contrast to a large number of attacks, an 
implementer focuses on a smaller number of effective tools 
and concepts available to counter groups of such attacks. For 
example, in the authors propose noise samplers with 
built-in fault protection. A generic method against simple 
fault attacks is double computation and a comparison of 
both results. However, when combining masking with dou- 
ble computations the performance overhead may become 
too high. 

As a consequence, it currently seems to be an open
question how to efficiently combine fault and side-channel
countermeasures for arithmetic operations in lattice-based
schemes while achieving low computational overhead.
Moreover, a suitable fault countermeasure should indicate
arithmetic faults (e.g., a bit flip in a coefficient of a polynomial
in $R_q$) but also faults in the control flow.

Contribution. In this work, we provide methods for the 
protection of arithmetic operations in lattice-based cryptog- 
raphy against side-channel and fault attacks. We evaluate 
the effectiveness of the redundant number representation 
(RNR) from and show that a naive implementation 
of the RNR is unsafe. The issue is the cancellation of ran- 
domization due to constant values or adversarially chosen 
inputs. We then implement the updated RNR approach on 
an ARM Cortex-M4 microcontroller and we show how it 
can be applied to fast software implementations of Kyber 
using a state-of-the-art 32-bit NTT. Our RNR-protected NTT 
implementation achieves a low overhead with only 7737 
cycles compared to 6829 cycles for an unprotected NTT. 
In addition, we perform practical side-channel evaluations 
using the t-test methodology to verify the approach. Furthermore,
we present a novel method to detect fault attacks
in lattice-based cryptography using the Chinese Remainder 
Theorem (CRT) and theoretically evaluate it for common 
parameter sets of lattice-based cryptography and against 
different realistic fault models. For the first time, we then 
show how the fault countermeasure can be combined with 
the RNR for the linear parts of Kyber decryption, which is 
a core part of the Kyber decapsulation to counter both side- 
channel and fault attacks. This is non-trivial as a careful 
parameter selection for RNR and CRT is required to still be 
able to use the NTT for fast computation. We achieve lower 
overhead than masking and redundant calculation. 


2 PRELIMINARIES 


In this section, we introduce the notation for lattice-based 
cryptography and the basics of the Kyber scheme. 


2.1 Notation 


For $x \in \mathbb{R}$, we define $\lceil x \rfloor = \lfloor x + \frac{1}{2} \rfloor \in \mathbb{Z}$. Let $\mathbb{Z}_q$ denote
the quotient ring $\mathbb{Z}/q\mathbb{Z}$ for an integer $q > 1$. Thus, $\mathbb{Z}_q$ is
the ring of cosets $x + q\mathbb{Z}$ with addition and multiplication
operations. For $a, b \in \mathbb{Z}$, we write $a \bmod^{+} b$ for the unique
integer $\hat{a} \equiv a \bmod b$ such that $0 \le \hat{a} < b$. Let $R = \mathbb{Z}[X]/(f)$,
where $f$ is usually $f = X^n + 1$ for $n$ being a power of 2,
and $R_q = R/(q) = \mathbb{Z}_q[X]/(f)$ for some positive integer $q$.
Any element $\mathbf{a} \in R_q$ as well as vectors of these elements
are denoted as bold lower case letters. We use the notation
$\mathbf{a}[i]$ for $i = 0, \dots, n - 1$ to access the $i$-th coefficient of $\mathbf{a}$.
Matrices of elements in $R_q$ are denoted as bold upper case
letters. For a given set $S$ and a probability distribution $D$
over $S$, we use $s \leftarrow D$ to mean $s \in S$ sampled according
to $D$ using coins $r$. In addition, we use $s \xleftarrow{\$} S$ to mean
$s \in S$ sampled uniformly at random from $S$. Hereby, $U(q)$
denotes the uniform distribution on $R_q$, whereas $\chi$ denotes
an error distribution to be defined for the specific algorithm.
When covering our implementation, we define the $x \bmod q$
operation for integers $x, q$ to always produce an output in
the range $[0, q - 1]$. Unless stated otherwise, when we access
an element $\mathbf{a}[i]$ of a polynomial $\mathbf{a} \in R_q$, we always assume
that $\mathbf{a}[i]$ is reduced modulo $q$ and in the range $[0, q - 1]$.


2.2 The NTT 


Designers of lattice-based schemes can use the number- 
theoretic transform (NTT) to reduce the computational cost 
of polynomial multiplications (see [17]). For example, the
NIST round 3 finalist Kyber has an NTT-friendly parameter
set and incorporates the NTT into the definition of the
scheme. The NTT enables efficient and simple-to-implement
polynomial multiplication for suitably parameterized rings
$R_q$. The product of two polynomials $\mathbf{a}, \mathbf{b} \in R_q$ can be computed
as $\mathbf{a} \cdot \mathbf{b} = \mathrm{INTT}(\mathrm{NTT}(\mathbf{a}) \circ \mathrm{NTT}(\mathbf{b}))$, where $\circ$ denotes
coefficient-wise multiplication. However, a straightforward
application of the NTT to $R_q$ and avoidance of any zero
padding requires the existence of a $2n$-th root of unity $\gamma$ in
$R_q$, which is the square root of the $n$-th root of unity $\omega$. This
holds for $R_q$ when $n$ is a power of 2 and $q$ is a prime such
that $q \equiv 1 \bmod 2n$.

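For a polynomial $\mathbf{g} = \sum_{i=0}^{n-1} \mathbf{g}[i] X^i \in R_q$ we
define $\mathrm{NTT}(\mathbf{g}) = \hat{\mathbf{g}} = \sum_{i=0}^{n-1} \hat{\mathbf{g}}[i] X^i$ with $\hat{\mathbf{g}}[i] =
\sum_{j=0}^{n-1} \gamma^j \mathbf{g}[j] \omega^{ij} \bmod q$, where $\omega$ is an $n$-th primitive root
of unity and $\gamma = \sqrt{\omega} \bmod q$. The inverse function INTT
is defined as $\mathrm{INTT}(\hat{\mathbf{g}}) = \mathbf{g} = \sum_{i=0}^{n-1} \mathbf{g}[i] X^i$ with $\mathbf{g}[i] =
(n^{-1} \gamma^{-i} \sum_{j=0}^{n-1} \hat{\mathbf{g}}[j] \omega^{-ij}) \bmod q$.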
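The NTT and INTT formulas of this section can be checked with a small, deliberately naive sketch. The toy parameters ($n = 8$, $q = 17$, $\gamma = 3$) and all function names are assumptions chosen for readability; they are not Kyber parameters, and a real implementation would use a fast butterfly structure instead of the quadratic-time sums below.

```python
# Naive O(n^2) negacyclic NTT, written directly from the summation formulas.
# Toy parameters (assumption): n = 8, q = 17, gamma = 3 has order 2n = 16 mod 17.

def ntt(g, q, gamma):
    """g_hat[i] = sum_j gamma^j * g[j] * omega^(i*j) mod q, with omega = gamma^2."""
    n = len(g)
    omega = gamma * gamma % q
    return [sum(pow(gamma, j, q) * g[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def intt(g_hat, q, gamma):
    """g[i] = n^-1 * gamma^-i * sum_j g_hat[j] * omega^(-i*j) mod q."""
    n = len(g_hat)
    omega_inv = pow(gamma * gamma, -1, q)
    gamma_inv = pow(gamma, -1, q)
    n_inv = pow(n, -1, q)
    return [n_inv * pow(gamma_inv, i, q)
            * sum(g_hat[j] * pow(omega_inv, i * j, q) for j in range(n)) % q
            for i in range(n)]

def schoolbook_negacyclic(a, b, q):
    """Reference product in Z_q[X]/(X^n + 1), using X^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]
    return [x % q for x in c]

Q, GAMMA = 17, 3
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [16, 0, 5, 11, 2, 9, 0, 3]
pointwise = [x * y % Q for x, y in zip(ntt(a, Q, GAMMA), ntt(b, Q, GAMMA))]
product = intt(pointwise, Q, GAMMA)
```

The weighting by powers of $\gamma$ is what turns the cyclic convolution of a plain transform into a product modulo $X^n + 1$ without zero padding.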


2.3 Kyber 


The KEM Kyber is currently in the third round of the NIST
PQC standardization process [15], [19]. Kyber’s security is
based on the hardness of solving the learning-with-errors
(LWE) problem in module lattices (see [20]). To achieve
semantic security with respect to adaptive chosen-ciphertext
attacks (CCA), Kyber internally uses an IND-CPA-secure
public-key encryption (PKE) scheme and applies
a variation of the Fujisaki-Okamoto (FO) transform [21].
Kyber instantiates the ring $R_q$ with the polynomial ring
$\mathbb{Z}_q[X]/(X^n + 1)$ with $n = 256$ and scales its security by
using the module structure.


2.3.1 Changes to the Kyber NTT in Round 2 


In this work, we always refer to the third round version 
of Kyber. However, we would like to highlight a tweak 
introduced in round 2 of the NIST process that has an impact
on the exact realization of the NTT.

Kyber updated the definition of the NTT to allow the
choice of a smaller modulus q = 3329 (n = 256 stays the 
same). This leads to smaller ciphertexts and public keys and 
enables the usage of a smaller noise distribution, which in 
turn reduces the amount of required pseudorandom bits. 


Definition 1 (Number-Theoretic Transform of Kyber). Let $\mathbb{Z}_q$
be a finite field and $\zeta$ be a primitive $n$-th root of unity
in $\mathbb{Z}_q$. Then the NTT of an element $\mathbf{x} \in R_q$ computes an
element $\mathbf{y} \in R_q$ via the map

$$\mathbf{y}[2k] = \sum_{j=0}^{n/2-1} \mathbf{x}[2j] \, \zeta^{(2\mathrm{br}_7(k)+1) j},$$
$$\mathbf{y}[2k+1] = \sum_{j=0}^{n/2-1} \mathbf{x}[2j+1] \, \zeta^{(2\mathrm{br}_7(k)+1) j},$$

where $\mathrm{br}_7(i)$ denotes the bit-reversed number of a seven-bit
integer $i$ and $k \in \{0, \dots, n/2 - 1\}$. The inverse NTT
(INTT) is given by inverting both parts of the NTT
individually as in Section 2.2.


Let $\zeta = 17$ be the first primitive 256th root of unity in the
case of Kyber. Then the equality

$$X^{256} + 1 = \prod_{i=0}^{127} (X^2 - \zeta^{2i+1}) = \prod_{i=0}^{127} (X^2 - \zeta^{2\mathrm{br}_7(i)+1}) \quad (1)$$
from holds. The Chinese remainder theorem (CRT) 
provides an isomorphism such that the polynomial multi- 
plication of degree 256 can be performed in 128 polynomial 
multiplications of degree two. If not explicitly stated other- 
wise, all usages of the NTT from now on refer to the updated 
definition of the NTT for Kyber introduced in round 2 and 
kept in round 3. 
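Equation (1) can be verified numerically. The following sketch, with hypothetical helper names, multiplies the 128 quadratic factors modulo $q = 3329$ and compares the result with $X^{256} + 1$, once with the odd exponents in natural order and once in bit-reversed order.

```python
Q, ZETA = 3329, 17

def br7(i):
    """Bit reversal of a seven-bit integer."""
    return int(format(i, "07b")[::-1], 2)

def polymul(a, b, q):
    """Schoolbook product of two coefficient lists (lowest degree first) mod q."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % q
    return c

def product_of_quadratics(exponents):
    """Multiply the factors X^2 - zeta^e for all e in exponents."""
    prod = [1]
    for e in exponents:
        prod = polymul(prod, [(-pow(ZETA, e, Q)) % Q, 0, 1], Q)
    return prod

target = [1] + [0] * 255 + [1]  # coefficients of X^256 + 1
natural_order = product_of_quadratics(2 * i + 1 for i in range(128))
bitrev_order = product_of_quadratics(2 * br7(i) + 1 for i in range(128))
```

Both products agree because bit reversal only permutes the odd exponents $1, 3, \dots, 255$.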


2.3.2 Simplified Kyber 


For later reference, we provide a simplified version of
the decryption of the public-key encryption scheme KYBER.CPA
in Algorithm 2 in Appendix B, where $d_u$ and $d_v$
are positive integers, $q = 3329$ and $\mathbf{u}$ and $\mathbf{v}$ are polynomials
in $R_q$ of degree $n = 256$. Define the functions

$$\mathrm{COMPRESS}_q(x, d) := \lceil (2^d/q) \cdot x \rfloor \bmod^{+} 2^d,$$
$$\mathrm{DECOMPRESS}_q(x, d) := \lceil (q/2^d) \cdot x \rfloor$$

componentwise on each coefficient of a polynomial. When
we apply the NTT to a vector of polynomials, the NTT gets 
applied to each polynomial individually. 
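As an illustration, the two functions can be written with integer arithmetic only. This is a sketch, not the reference implementation: the function names and the rounding trick (adding half of the divisor before an integer division, which rounds ties upwards) are choices made here.

```python
Q = 3329

def compress(x, d, q=Q):
    """round((2^d / q) * x) mod 2^d, computed without floating point."""
    return ((2 * (1 << d) * x + q) // (2 * q)) % (1 << d)

def decompress(x, d, q=Q):
    """round((q / 2^d) * x), computed without floating point."""
    return (2 * q * x + (1 << d)) // (2 * (1 << d))

def max_roundtrip_error(d, q=Q):
    """Largest centered distance |decompress(compress(x)) - x| over all x in [0, q)."""
    worst = 0
    for x in range(q):
        diff = (decompress(compress(x, d), d) - x) % q
        worst = max(worst, min(diff, q - diff))
    return worst
```

For $d = 1$, the pair maps a coefficient to a single message bit and back to either $0$ or $\lceil q/2 \rfloor = 1665$.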


3 REDUNDANT NUMBER REPRESENTATION 


In this section, we analyze the applicability of the redundant 
number representation to lattice-based schemes and show 
unexpected pitfalls and security issues for some parameter 
choices. In their work, Zijlstra, Bigou, and Tisserand
propose to randomize a coefficient $c \in \mathbb{Z}_q$ by adding $r \cdot q$
to it, where $r \in [0, 2^k)$ is a random number for some
integer $k$ and different for each coefficient. The integer $k$
denotes the number of bits of randomness. Then, all arithmetic
operations are performed in $\mathbb{Z}_{2^k q}$. Therefore, in each
execution, all adders, multipliers, and decoders are handling
randomized inputs. The redundant number representation
countermeasure is then analyzed by a simulated correlation
power analysis (CPA) attack for different redundancy levels,
i.e., values of $k = 0$ to $k = 8$. Moreover, it is shown that the
overhead in an FPGA implementation is low.
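A minimal sketch of the basic idea, assuming $q = 3329$ and $k = 8$ for illustration (the function names are hypothetical): every operand is lifted to $c + r \cdot q$ with fresh randomness, arithmetic happens modulo $2^k q$, and a final reduction modulo $q$ removes the redundancy.

```python
import secrets

Q = 3329  # Kyber modulus
K = 8     # bits of randomness per coefficient (illustrative choice)

def randomize(c, q=Q, k=K):
    """Return the redundant representation c + r*q with a fresh r in [0, 2^k)."""
    r = secrets.randbelow(1 << k)
    return c + r * q

def rnr_mul(a, b, q=Q, k=K):
    """Multiply two redundant values in Z_{2^k * q}."""
    return (a * b) % ((1 << k) * q)

def derandomize(x, q=Q):
    """Remove the redundancy by reducing modulo q."""
    return x % q

a, b = 1234, 567
result = derandomize(rnr_mul(randomize(a), randomize(b)))
```

The result is independent of the random values because $2^k q$ is a multiple of $q$, so reducing modulo $2^k q$ never changes the residue modulo $q$.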

In this work, we use $q'$ to denote the multiple of the
modulus $q$, which is $q' = 2^k$ in [14]. Additionally, for a
more detailed analysis, we increase the range of possible
randomization values by allowing $k > 8$ and values of $q'$ that are not
a power of two. Thus, when randomization is applied, all
operations are performed mod $q'q$ and constants or intermediate
values are randomized by adding $rq$ with random
$r \in [0, q')$. Note that each coefficient is randomized with a
different value. Finally, to remove the randomization, all values
are reduced modulo $q$. To obtain a fast and side-channel
secured Kyber decryption (KYBER.CPA.DEC), the RNR can
be applied during the computation of $\mathrm{INTT}(\hat{\mathbf{s}} \circ \mathrm{NTT}(\mathbf{u}))$. In
addition, it can be used to protect the linear parts of the
Kyber re-encryption (KYBER.CPA.ENC). However, two issues
arise in a typical Kyber microcontroller implementation
where the RNR should also be used to protect the NTT.
One is related to the incompatibility of a power-of-two
$q'$ with typical NTT implementations and the other to
derandomization caused by NTT constants or an attacker.


3.1 Enabling of Fast Montgomery Reduction


An important technique for efficient lattice-based cryptography
in finite rings with prime modulus is the usage of
Montgomery reduction and representation for fast modular
reduction [22]. The technique was first applied to a
fast software implementation of NewHope where all
precomputed constants used for the NTT are stored in
Montgomery representation and is widely used in other
implementations as well. However, this technique requires a
Montgomery constant $M$ of the form $M = 2^z$ for an integer
$z$. In addition, this constant has to be relatively prime to
the modulus of the finite ring. Hence, the usage of $q' = 2^k$
for $R_{q'q}$ is excluded. A straightforward solution is to use
an odd integer $q'$ for the ring extension method to make
Montgomery reduction work efficiently with the RNR.
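The coprimality requirement can be made concrete with a generic textbook Montgomery reduction modulo $q'q$. In this sketch the constant $M = 2^{32}$, the odd value $q' = 251$ and the function names are illustrative assumptions, not the parameters of the implementation discussed in this work.

```python
Q = 3329
R_BITS = 32
R = 1 << R_BITS  # Montgomery constant M = 2^32 (assumed for illustration)

def montgomery_setup(modulus):
    """Return -modulus^{-1} mod R; raises ValueError if gcd(modulus, R) != 1."""
    return (-pow(modulus, -1, R)) % R

def montgomery_reduce(t, modulus, neg_inv):
    """Return t * R^{-1} mod modulus for 0 <= t < R * modulus."""
    m = ((t % R) * neg_inv) % R
    u = (t + m * modulus) >> R_BITS  # exact division, since t + m*modulus = 0 mod R
    return u - modulus if u >= modulus else u

q_prime = 251                 # odd, hence coprime to R
modulus = q_prime * Q
neg_inv = montgomery_setup(modulus)

a, b = 123456, 654321
a_mont = a * R % modulus      # conversion into Montgomery representation
b_mont = b * R % modulus
prod_mont = montgomery_reduce(a_mont * b_mont, modulus, neg_inv)
result = montgomery_reduce(prod_mont, modulus, neg_inv)  # back to normal form
```

Calling `montgomery_setup(256 * Q)` fails because an even modulus has no inverse modulo a power of two, which is the incompatibility described above.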


3.2 The NTT and (Adversarial) Derandomization 


To avoid unforeseen pitfalls when using the RNR method,
a theoretic analysis of the algebraic properties of the ring
extension is required on top of the evaluation in [14].
Hereby, we focus on the case where $q$ is a prime number.
This parameter choice has been made by some NIST finalist
schemes, e.g., Kyber [15], whereas others select $q$ as a power
of two. However, for these schemes, the RNR is not applicable,
as for any $q$ equal to a power of two the value
$rq$ is equal to a bit shift to the left of $r$ and does not impact
lower-order bits when added to a constant or intermediate
value.

We recall a basic algebraic theorem concerning finite
rings.

Theorem 1. Given a finite ring $R$, any element $a \in R$ is either
a unit in the ring, i.e., there is an element $b \in R$ with
$ab = 1_R$, or a zero divisor, i.e., there exists an element
$b \in R$ with $b \neq 0_R$ and $ab = 0_R$.
Fig. 1: Resulting Control Flow of RNR for Kyber Decryption. Flow: (offline) initial randomization of $\mathbf{s}$ and constants; decompression of inputs $\mathbf{u}, \mathbf{v}$; randomization of inputs $\mathbf{u}, \mathbf{v}$; computation of $\mathbf{v} - \mathrm{INTT}(\hat{\mathbf{s}} \circ \mathrm{NTT}(\mathbf{u}))$; decoding to message $m$.


In the case of a prime modulus $q$, the coefficient ring $\mathbb{Z}_q$ is a finite field,
i.e., $\mathbb{Z}_q$ does not contain any zero divisors. By using the
RNR, the commutative ring $\mathbb{Z}_{q'q}$ satisfies every property of
a field except for the existence of inverse elements. By
Theorem 1, the ring $\mathbb{Z}_{q'q}$ therefore contains zero divisors.
Since the proposed method is based on arithmetic
operations, this poses the question whether the randomness can
be removed by multiplying with a specific number. Indeed,
during the NTT in the Kyber decryption, precomputed powers
of $\zeta$ in Montgomery representation (see Section 2.3.1) are
multiplied with secret intermediate values. The powers of
$\zeta$ are fixed and could eliminate randomness during every
execution of the decryption. This may enable SPA or DPA
attacks. For instance, for the choice of $q' = 257$, the multiplication
with $\zeta^{71} \cdot M \bmod q = 1799 = 7 \cdot 257$ (including the
Montgomery constant $M = 2^{16}$) removes the randomness
part during every decryption. For a coefficient $c$ randomized
by addition of $rq$ for $r \in [0, q')$ it holds that

$$\begin{aligned}
t &= (c + r \cdot q) \cdot (\zeta^{71} \cdot M \bmod q) \bmod q'q \\
  &= c \cdot 1799 + r \cdot 1799 \cdot q \bmod q'q \\
  &= c \cdot 1799 + r \cdot (7 \cdot 257) \cdot q \bmod q'q \\
  &= c \cdot 1799 + (r \cdot 7) \cdot q'q \bmod q'q \\
  &= c \cdot 1799 \bmod q'q.
\end{aligned}$$

To avoid processing any non-randomized values, the
parameter choice $q'$ can be adjusted such that no constants
contain a multiple of $q'$.
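The cancellation above can be reproduced with a few lines. The sketch below scans Montgomery-form powers of $\zeta$ for multiples of a candidate $q'$ and counts how many distinct products a randomized coefficient can yield. The exponent range $1, \dots, 127$ and the function names are assumptions for illustration; the constant tables of real implementations (signed representatives, additional base-multiplication constants) may differ.

```python
Q, ZETA, M = 3329, 17, 1 << 16

def montgomery_constants():
    """zeta^e * M mod q for e = 1..127 (assumed exponent range)."""
    return {e: pow(ZETA, e, Q) * M % Q for e in range(1, 128)}

def derandomizing_constants(q_prime):
    """Constants that are multiples of q' and therefore cancel the term r*q."""
    return {e: k for e, k in montgomery_constants().items() if k % q_prime == 0}

def distinct_products(c, k, q_prime):
    """Number of distinct values (c + r*q) * k mod q'q over all r in [0, q')."""
    return len({(c + r * Q) * k % (q_prime * Q) for r in range(q_prime)})

bad = derandomizing_constants(257)
collapsed = distinct_products(1234, 1799, 257)  # constant from the example above
spread = distinct_products(1234, 2285, 257)     # 2285 = M mod q, coprime to 257
```

A parameter search along these lines could reject every candidate $q'$ for which `derandomizing_constants` returns a non-empty result.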

Depending on the selection of $q'$, an adversary could
exploit the removal of randomness by crafting malicious
ciphertexts. The choice of the crafted coefficients of $\mathbf{u}$ is
limited to $2^{d_u}$ values. If any of these possible values contains
$q'$ as a factor, the attacker can choose all coefficients of
$\hat{\mathbf{u}} = \mathrm{NTT}(\mathbf{u})$ to match that value. The coefficient-wise
multiplication of the secret $\hat{\mathbf{s}}$ by such a $\hat{\mathbf{u}}$ would then
trigger the removal of the randomization of $\hat{\mathbf{s}}$. Hence, secret
information may then be processed non-randomized. The
exposure to this relatively simple chosen-ciphertext attack
depends on $q'$. Due to the compression in the CCA2-secure
schemes and, as a result, the limited choice of input values
for $\mathbf{u}$, the possibility of such an attack can be calculated and,
therefore, mitigated given a specific $q'$. Another option to
counter this attack is the additional randomization of the
inputs or the constant values (e.g., the powers of $\zeta$). We show
the recommended control flow for Kyber in Figure 1.
3.3 Validation in Hamming Weight Leakage Model


We now verify the effectiveness of the RNR in the commonly
used Hamming weight leakage model [24], for different
choices of $q$ and $q'$. Hereby, we assume that an adversary can
observe a leakage $L(x) = HW(x) + \mathcal{N}$, where $\mathcal{N}$ denotes an
additive noise with mean $\mu = 0$ and $HW(x)$ denotes the
Hamming weight of $x$. We simulate an attack for an adversary
obtaining $L(s' \cdot c \bmod (qq'))$ with zero noise, where a
fixed secret $s \in \mathbb{Z}_q$ is protected as $s' = s + r \cdot q \bmod (qq')$ for
a random $r \in \mathbb{Z}_{q'}$ and a public and changing coefficient $c$.
This resembles a DPA attack on the decapsulation operation
in Kyber where a secret key $\mathbf{s}$ is multiplied coefficient-wise
with ciphertext coefficients $\mathrm{NTT}(\mathbf{u})$ controlled by the
attacker. We simulate $N = 100$ traces for a fixed secret
$s \in \mathbb{Z}_q$ and varying coefficients $c$ and randomness $r$. We
then make use of the maximum-likelihood method that
calculates the probability of occurrence for every possible
value $\hat{s}$ based on the simulated leakage. This is an expensive
yet very detailed brute-force approach that is widely used [27],
[28]. We then choose the $\hat{s}$ that makes the observations
most probable (commonly called the maximum likelihood
estimate). If the chosen $\hat{s}$ is correct, i.e., $\hat{s} = s$, the attack is
a success.
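The simulated attack can be sketched as follows. This is a toy version under stated assumptions: a small modulus $q = 97$ keeps the brute-force search fast in pure Python, the leakage is noise-free, and the function names are hypothetical. It is not the simulation code used for the experiments.

```python
import math
import random

def hw(x):
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")

def simulate_traces(s, q, q_prime, n_traces, rng):
    """Noise-free leakage HW(s' * c mod qq') for fresh r and public c per trace."""
    traces = []
    for _ in range(n_traces):
        c = rng.randrange(1, q)
        r = rng.randrange(q_prime)
        s_prot = (s + r * q) % (q * q_prime)
        traces.append((c, hw(s_prot * c % (q * q_prime))))
    return traces

def log_likelihood(s_hat, traces, q, q_prime):
    """Sum over traces of log P(observed HW | s_hat), with r uniform in [0, q')."""
    total = 0.0
    for c, observed in traces:
        hits = sum(hw((s_hat + r * q) * c % (q * q_prime)) == observed
                   for r in range(q_prime))
        if hits == 0:
            return -math.inf
        total += math.log(hits / q_prime)
    return total

def mle_candidates(traces, q, q_prime):
    """All guesses s_hat that maximize the likelihood of the observations."""
    scores = [log_likelihood(s_hat, traces, q, q_prime) for s_hat in range(q)]
    best = max(scores)
    return [s_hat for s_hat, score in enumerate(scores) if score == best]

rng = random.Random(0)
secret = 42
unprotected = simulate_traces(secret, 97, 1, 50, rng)  # q' = 1: no randomization
protected = simulate_traces(secret, 97, 8, 50, rng)    # q' = 8: toy RNR
```

Repeating `mle_candidates` over many secrets and counting how often it returns exactly the true secret gives an estimate of the success rate for a given pair $(q, q')$.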

Firstly, we test the impact of the size of $q'$ in combination
with $q$. In Figure 2, we plot the success rate using maximum-likelihood
estimation (MLE) depending on the size of $q'$
and for different values of $q$. For each tested $q'$ we select
1000 different secrets $s$ and compute the success rate, i.e.,
the share of cases where the MLE correctly predicted $\hat{s} = s$. Note that
the different values of $q$ have a slightly different binary
representation, including


• $q = 8192 = 10000000000000_2$ (Saber)
• $q = 12289 = 11000000000001_2$ (NewHope)
• $q = 3329 = 110100000001_2$ (Kyber)


to capture the behavior of this method for different lattice-based
schemes. As expected, the RNR does not work well
with the power-of-two modulus of Saber. Moreover, the
success rate for the Kyber modulus is smaller than for
NewHope. Note that this experiment is in line with correlation
power simulations in the Hamming weight leakage
model already performed in [14]. However, they only tested
for a power of two $q'$ and a prime number $q$.


Fig. 2: Comparison of success rates using maximum-likelihood
estimation on the Hamming weight of the result
$L(s' \cdot c \bmod (qq'))$ for different lattice-based schemes.
(Plot: success rate versus value of $q'$ from 0 to 250 with $N = 100$ traces per attack; curves for Saber ($q = 2^{13}$), Kyber ($q = 3329$), and NewHope ($q = 12289$).)
As shown in Figure 2, we observed a different success
rate for the RNR for the NewHope and Kyber modulus.
Thus, we ran experiments for different moduli, shown in
Figure 3. From the experimental results we draw the following
conclusions:


• Randomization of lower-order bits: In order to randomize
any value $x \in [0, q)$ properly, the least
significant bit (LSB) of $q$ should be one. Similar to the
case where $q$ is a power of two, an even $q$ would turn
part of the addition of $rq$ into a bit shift to the left; an odd $q$
prevents this early shift. Hence, the least significant
bits of the sensitive value are always affected by
randomization.

• Hamming weight of $q$: Not only a too sparse structure
of $q$ but also too many bits equal to one can
impede the effectiveness of this method, as shown
in Figure 3. This is because $q = 111111111111_2$, for
instance, can be written as $q = 2^{12} - 1$. This reduces
the number of bits affected by randomization, as the
addition of $r \cdot 2^{12}$ does not have an impact on the 12
lower-order bits.

• Structure of $q$ in binary representation: As can be
seen in Figure 3, not only the Hamming weight
impacts the effectiveness of this method. Figure 3 (right)
shows the results of the experiment for values of
$q$ with Hamming weight three. It might be part of
future work to determine the exact reason for this
behavior.
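The first conclusion and the power-of-two case can be illustrated by collecting the bit positions of $x + r \cdot q$ that change for at least one $r$. This is a simplified sketch with a hypothetical function name; it only shows which bits can be touched at all and says nothing about how uniformly they are randomized.

```python
def affected_bits(q, q_prime, x=0):
    """Bit mask of the positions of x + r*q that differ from x for some r in [0, q')."""
    mask = 0
    for r in range(q_prime):
        mask |= (x + r * q) ^ x
    return mask

saber_mask = affected_bits(8192, 256, x=1234)  # q = 2^13: r*q is a pure left shift
kyber_mask = affected_bits(3329, 256, x=1234)  # odd q: the LSB is touched
```

For $q = 2^{13}$ the 13 low-order bits of the sensitive value stay untouched for every $r$, whereas for the odd Kyber modulus the mask already includes the least significant bit.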


As a consequence, values of $q$ that are less effective in
randomizing in the context of the RNR need a larger $q'$
and thus larger randomness $r \in [0, q')$ to increase entropy.
As mentioned in Section 3.2, the NTT itself consists of
multiplications with fixed values. The result of multiplying a
randomized coefficient with a known value should also be
randomly distributed. The pitfall of derandomization is not
covered in the analysis of [14]. It can be captured by a
slightly modified version of the maximum-likelihood attack
for random but known coefficients $c$. In this version, we
simulate a number of $N$ traces of the multiplication with
a specific power of $\zeta$. Let $s$ be the correct subkey guess.
We can observe the Hamming weight $HW_{obs}$ of the result
of the multiplication $(s + r \cdot q) \cdot \zeta^i$ using the Hamming
weight leakage model. Then, for every possible subkey $\hat{s}$ the
probability $P(HW((\hat{s} + r \cdot q) \cdot \zeta^i) = HW_{obs})$ is calculated.
We repeat this process for each trace and sum up the
probabilities for each $\hat{s}$. If the randomization has failed for
the specific power of $\zeta$ and $q'$, the correct subkey guess is
among the most likely ones, i.e.,

$$\max_{\hat{s}} \sum_{j=1}^{N} P(HW((\hat{s} + r \cdot q) \cdot \zeta^i) = HW_{obs}) = \sum_{j=1}^{N} P(HW((s + r_j \cdot q) \cdot \zeta^i) = HW_{obs}).$$

This is then a successful attack as the adversary can narrow
down the subkey possibilities to all $\hat{s}$ with $HW(\hat{s} \cdot \zeta^i) =
HW_{obs}$. We show the success rates for 100 attacks on every
multiplication with a power of $\zeta$ used in the NTT of Kyber
in Figure 4 for $q = 3329$, $q' = 257$ and $N = 100$ traces for
each attack. The success rate of 100% for the constant $\zeta^{71} \cdot M \bmod q = 1799$ indicates
an insecure parameter set.
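For the insecure parameter set, the narrowing of the subkey space can be reproduced directly. The sketch below assumes noise-free Hamming weight leakage and uses hypothetical function names. It shows that the observed Hamming weight no longer depends on $r$ and lists all subkey guesses consistent with it.

```python
Q, Q_PRIME, CONST = 3329, 257, 1799  # insecure parameter set from the text

def hw(x):
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")

def observed_hws(s, const=CONST, q=Q, q_prime=Q_PRIME):
    """Set of Hamming weights of (s + r*q) * const mod q'q over all r in [0, q')."""
    return {hw((s + r * q) * const % (q * q_prime)) for r in range(q_prime)}

def consistent_subkeys(hw_obs, const=CONST, q=Q, q_prime=Q_PRIME):
    """All guesses s_hat whose unrandomized product has the observed Hamming weight."""
    return [s_hat for s_hat in range(q)
            if hw(s_hat * const % (q * q_prime)) == hw_obs]

secret = 1234
weights = observed_hws(secret)  # a single value: the randomness has cancelled
candidates = consistent_subkeys(next(iter(weights)))
```

With a constant that is coprime to $q'$, the same secret would produce a spread of Hamming weights over the different values of $r$ instead of a single one.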
3.4 Relation to Masking


Recently, the masking approach for lattice-based cryptogra- 
phy has received increased research attention (e.g. for 
Kyber). Masking offers the possibility to prove security
in the t-probing model. This means an adversary using t 
probes is not able to successfully obtain information about 
secrets in an implementation with t + 1 shares. This is inde- 
pendent of the number of traces that are captured. Masking 
can, therefore, be considered a powerful but computation- 
ally expensive tool because even linear functions have to 
be executed at least twice to counter side-channel attacks. 
The RNR approach, on the other hand, is more related to 
the blinding technique (e.g. presented in for ECC). The 
RNR is a heuristic method that cannot be proven in the t- 
probing model. An adversary that can probe randomized 
values can easily calculate the non-randomized values. The 
technique, therefore, increases security in practice by dras- 
tically increasing the number of traces an adversary has to 
capture to obtain secret information. In use-cases that only 
allow a limited number of decryptions, the RNR method 
may offer a better cost-benefit ratio than masking. Also in
combination with masking or other hiding countermeasures
such as shuffling, the RNR may be a powerful countermeasure,
although this is out of scope for this work. 


4 FAULT PROTECTION WITH CHINESE REMAINDER 
THEOREM 


Several fault attacks on lattice-based schemes have already 
been proposed and only a few works on effective coun- 
termeasures exist. Common countermeasures are redun- 
dant calculations, checksums, or sensor-based countermea- 
sures such as light detection or supply voltage monitoring 
(see BT). As some of these techniques require a high 
computational overhead or changes to a device (e.g., light 
sensors), they are often not cost-effective. 

In this work, we propose a new technique for lattice-based
cryptography. When applied properly, it can be used
to detect fault attacks on coefficients in data structures
representing polynomials in $R_q$ in memory or during computations
and also offers some protection of the control
flow. In our countermeasure, we use the Chinese
remainder theorem (CRT) to combine elements of a lattice-based
cryptosystem (e.g., secret keys, public elements) in a
ring modulo $q$ with constants in a ring modulo $q'$ (as $q'$ has
a similar effect as in Section 3, we use the same notation). A
function $g$ consisting of operations like polynomial addition
and multiplication (including NTTs and coefficient-wise
multiplication) is then carried out over the combined ring modulo
$\hat{q} = qq'$. After the sequence of computations is finished, the
two rings are split again and the results obtained in the ring
modulo $q'$ are compared to predetermined checkvalues. The
checkvalue is obtained by computing $g$ on the constants in
the ring modulo $q'$. The intuition is that faults introduced
during the modulo $\hat{q}$ computations will also disturb the
final result obtained in the ring modulo $q'$. Our method is
related to CRT codes [2], the Residue Number System used 
for ECC and Shamir’s Countermeasure for RSA and 
its analog method for elliptic curves in . However, we 
do not aim at error correction and work on fundamentally
different data structures and constraints imposed by ring
parameters. We would also like to note that an attacker
might use any fault detection method, like the one provided,
to carry out safe-error attacks (see for such attacks on
ECC) or other attacks that exploit the presence of an error
detection mechanism. A countermeasure that makes some
safe-error attacks harder is randomization of the processed
data, a topic that we consider in Section 5 in combination
with fault protection.

Fig. 3: Success rates for different Hamming weights (left) and success rates for different binary structures with the same
Hamming weight (right).

Fig. 4: Success rates (correct subkey guess being among
the most likely ones) of the maximum-likelihood attack on
multiplication with a constant containing $q'$ as factor.


4.1 Formal Description 


More formally, given the factor ring $R_q = \mathbb{Z}_q[X]/(f)$ for a
suitable $f$, e.g., $f = X^n + 1$, we introduce two new factor
rings $R_{q'} = \mathbb{Z}_{q'}[X]/(f)$ and $R_{\hat{q}} = \mathbb{Z}_{\hat{q}}[X]/(f)$ with $q$ being
relatively prime to $q'$ and $q \cdot q' = \hat{q}$. In this case the CRT
yields an isomorphism

$$\mathbb{Z}_{\hat{q}}[X]/(f) \cong \mathbb{Z}_{q'}[X]/(f) \times \mathbb{Z}_q[X]/(f) \quad (2)$$

in which coefficients $\mathbf{r}[j]$ of $\mathbf{r} \in R_{q'}$ are associated with
coefficients $\mathbf{v}[j]$ of input data $\mathbf{v} \in R_q$ for $j = 1, \dots, n$.

We can achieve fault protection when carrying out a
function $g_R : R^l \to R^k$ with $l$ ring elements from a ring $R$
as inputs that produces $k$ ring elements from $R$ as output.
As an example, the function $\mathbf{t}_1 = g_{R_q}(\mathbf{a}, \mathbf{e}, \mathbf{s}) = \mathbf{a} \cdot \mathbf{e} + \mathbf{s}$
computes a sample from the RLWE distribution with $l = 3$
inputs and one output ($k = 1$) in $R_q$.
Our countermeasure is initialized by fixing suitable values
for $n, q, q', f$ that define the rings $R_q, R_{q'}, R_{\hat{q}}$ and by
fixing a function $g$ that then implies values of $l$ and $k$. Without
our countermeasure implemented, the device would
compute $g_{R_q}$ on input values $\mathbf{v}_i$ for $i = 1, \dots, l$ (e.g., decryption
or encryption in Kyber). The first step is performed
offline before the device is deployed. In this step, constants
$\mathbf{r}_1, \dots, \mathbf{r}_l \in R_{q'}$ are sampled from a suitable distribution, like
the uniform distribution, or generated using a deterministic
pseudorandom generator (PRG) based on a seed. The checkvalues
$\mathbf{t}_1, \dots, \mathbf{t}_k$ are computed as $g_{R_{q'}}(\mathbf{r}_1, \dots, \mathbf{r}_l) = (\mathbf{t}_1, \dots, \mathbf{t}_k)$ and
the constants (or seeds) and checkvalues are then stored in
the device’s ROM or non-volatile memory (NVM).

In the online phase, the attacker has access to the device.
The function $g$ is protected by combining coefficients of the
$v_i$'s component-wise with the respective coefficients of the
$r_i$'s as

$$z_i[j] = \bigl( r_i[j] \cdot q \cdot (q^{-1} \bmod q') + v_i[j] \cdot q' \cdot ({q'}^{-1} \bmod q) \bigr) \bmod \hat{q} \qquad (3)$$

for $i = 1, \ldots, l$ and $j = 1, \ldots, n$. Then, the function $g_{R_{\hat{q}}}$
is applied to the lifted coefficients, which results in values
$(p_1, \ldots, p_k) = g(z_1, \ldots, z_l)$. The integrity of the results of
the calculation can now be checked by simply reducing all
$p_i \bmod q'$ and comparing the result with the respective $t_i$.
Accordingly, the intended result of the function $g_{R_q}$ can
then be obtained by reducing all values $p_i$ modulo $q$ as
$w_i = p_i \bmod q$. Of course, it is also possible to set
$g(v, \hat{s}, u) = v - \mathrm{INTT}(\hat{s} \circ \mathrm{NTT}(u))$ to protect the linear parts of the Kyber
decryption. Note that our countermeasure does not interfere
with the MLWE structure of Kyber and has no dependency
on $n$ or $f$ in $R_q$. Moreover, the computational overhead is
rather low as $q \cdot (q^{-1} \bmod q')$ and $q' \cdot ({q'}^{-1} \bmod q)$, which
are used during the combination, are constant. Thus, the
overhead for combining one coefficient is two multiplications
and one addition modulo $\hat{q}$. The computation of the
result and checksum requires reductions modulo $q$ and $q'$,
respectively.
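The offline and online phases described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation evaluated in this work: it uses a toy degree n = 4, a schoolbook negacyclic multiplication instead of an NTT, the example function g(a, e, s) = a·e + s, and hypothetical helper names (`lift`, `protected_g`). The simulated fault is a bit flip in the first output coefficient.

```python
# Illustrative sketch (assumptions: toy degree, schoolbook multiplication,
# hypothetical helper names). Not the implementation evaluated in the paper.
import random

Q, QP = 3329, 7681            # q (Kyber) and one possible q'
QHAT = Q * QP                 # lifted modulus q^ = q * q'
N = 4                         # toy degree; Kyber uses n = 256

# Constants of Equation (3); both can be precomputed.
C_R = Q * pow(Q, -1, QP)      # equals 1 mod q' and 0 mod q
C_V = QP * pow(QP, -1, Q)     # equals 1 mod q  and 0 mod q'

def polymul(a, b, m):
    """Schoolbook product in Z_m[X]/(X^n + 1) (negacyclic convolution)."""
    out = [0] * N
    for i in range(N):
        for j in range(N):
            k, sign = (i + j) % N, (-1 if i + j >= N else 1)
            out[k] = (out[k] + sign * a[i] * b[j]) % m
    return out

def g(a, e, s, m):
    """Example function g(a, e, s) = a*e + s evaluated in R_m."""
    return [(x + y) % m for x, y in zip(polymul(a, e, m), s)]

def lift(v, r):
    """Equation (3): combine data v in R_q with constants r in R_q'."""
    return [(ri * C_R + vi * C_V) % QHAT for vi, ri in zip(v, r)]

def protected_g(vs, rs, ts, fault=None):
    """Evaluate g in R_q^, verify modulo q', and return the result modulo q."""
    p = g(*[lift(v, r) for v, r in zip(vs, rs)], QHAT)
    if fault is not None:                 # simulated fault in one coefficient
        p[0] ^= fault
    if [c % QP for c in p] != ts:
        raise ValueError("fault detected")
    return [c % Q for c in p]

rng = random.Random(1)
rs = [[rng.randrange(QP) for _ in range(N)] for _ in range(3)]  # offline constants
ts = g(*rs, QP)                                                 # offline checkvalues
vs = [[rng.randrange(Q) for _ in range(N)] for _ in range(3)]   # online inputs
result = protected_g(vs, rs, ts)
```

Without a fault, `result` equals the unprotected evaluation of g in R_q; with `fault=1` the check modulo q' fails, because a single bit flip changes the coefficient by a power of two, which is never a multiple of the odd modulus q'.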


4.2 Evaluation and Analysis 


Offline: $g(r_1, \ldots, r_l) = (t_1, \ldots, t_k)$
→ Obtain lifted polynomials $z_1, \ldots, z_l$ (Equation 3)
→ Calculate $g(z_1, \ldots, z_l)$
→ Check $p_i \bmod q' == t_i$
→ Obtain final result $w_i = p_i \bmod q$ or handle error

Fig. 5: Fault protection of a function $g(v_1, \ldots, v_l)$ with CRT

In this work, we aim for an appropriate fault protection on
a standard microcontroller using a realistic fault model [31]
that is also related to attack classes presented in [87].
We consider an attacker who is able to precisely control
the injection of one fault from the following three categories into a
coefficient in $\mathbb{Z}_{\hat{q}}$ in a data processing or storage part of the
device during the execution of the function $g$:


• Bit-flipping fault: An attacker may flip (bit-flip fault)
or set a certain number of bits (stuck-at fault) of data
in a device under attack. This may be achieved, e.g., by
using optical fault injection with a laser that shoots
into the register file.

• Random fault: An attacker may disturb a computation
or a memory access so that the device proceeds
with a random or pseudorandom data element.
This may be achieved, e.g., by under-powering, power
spiking, or clock glitching during multiplication or
readout of data from RAM.

• Zeroization fault: An attacker may introduce a fault
so that a zero value is processed, e.g., by enabling an
internal power gate.


Intuitively, an attacker has to induce a fault such that
the result modulo $q$ is changed (to make the attack effective)
while the checksum modulo $q'$ stays the same. Thus, while
storing or processing a coefficient $x \in \mathbb{Z}_{\hat{q}}$ in a suitable
representation (e.g., as an integer reduced to the range 0 to $\hat{q} - 1$), the
attacker needs to introduce a fault $e \in \{0, 1\}^{\lceil \log_2 \hat{q} \rceil}$ in a
binary bit-flipping model such that $x \bmod q' = (x \oplus e) \bmod q'$
and $x \bmod q \neq (x \oplus e) \bmod q$.

We now analyze the properties of parameter sets $q, q'$
that one would commonly encounter in lattice-based cryptosystems.
For this, we enumerate all representations and
compute the minimum distance by brute force. This
gives the number of bits an attacker would need to flip in
a bit-flipping attack. In addition, we estimate the success
probability of a random value attack as $(q - 1)/\hat{q}$. In this
attack, the attacker has to hit one of the $q - 1$ values that
represent a target $r$ with a different $v$ out of the $\hat{q}$ possibilities.¹
If we assume no special care is taken to counter the
zeroization attack and a uniform distribution of coefficients
of the $p_i$'s and $r_i$'s, then the probability of being successful is
$(q - 1)/\hat{q}$ as the attack succeeds when a coefficient is hit
that encodes $r = 0$ and $z \neq 0$. In Table 1 we provide our
results. Increasing $q'$ naturally results in better protection
against a random value attack and zeroization attack.
However, the resistance to bit flipping does not necessarily
increase and depends highly on the particular choice of $q'$. It is
also important to consider that our method only protects the
function $g$ and additional protection may be required for the
combination of data and checksum as well as extraction and
validation of the correct computation (e.g., skipping faults).

1. This calculation may slightly differ in practice depending on the
implementation and representation of elements in $\mathbb{Z}_{\hat{q}}$ and how the
implementation handles a fault (e.g., randomization of a full 32-bit
register) that may cause a value larger than $\hat{q}$.
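The brute-force search for the minimum distance can be illustrated as follows. The sketch is not the tool used for Table 1; it assumes that a faulted value is not reduced again (so it may exceed $\hat{q} - 1$, compare the footnote), uses hypothetical function names, and runs on toy moduli because the exhaustive search is slow in Python for realistic parameters.

```python
# Illustrative brute-force search (assumption: faulted values are not
# reduced again; function names are hypothetical).
from itertools import combinations

def undetected_effective(x, e, q, qp):
    """True if flipping the bits in mask e changes x mod q but not x mod q'."""
    y = x ^ e
    return x % qp == y % qp and x % q != y % q

def min_distance(q, qp):
    """Smallest Hamming weight of an undetected, effective fault mask."""
    qhat = q * qp
    bits = (qhat - 1).bit_length()          # width of one stored coefficient
    for weight in range(1, bits + 1):
        for positions in combinations(range(bits), weight):
            e = sum(1 << b for b in positions)
            for x in range(qhat):
                if undetected_effective(x, e, q, qp):
                    return weight, x, e     # first witness of minimal weight
    return None

weight, x, e = min_distance(7, 5)           # toy moduli with q^ = 35
```

A single bit flip changes a value by a power of two and therefore never preserves the residue modulo an odd q' > 1, so the minimum distance is at least 2. For q' = 257 = 2^8 + 1 this bound is reached: the two-bit mask e = 257 applied to x = 0 leaves the checksum at zero but changes the value modulo q = 3329.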


Scheme     | q     | q'   | overhead in bits | min. dist
Kyber      | 3329  | 257  | 8 (45 %)         | 2
Kyber      | 3329  | 1699 | 11 (48 %)        | 4
Kyber      | 3329  | 7681 | 13 (52 %)        | 3
NTRU Prime | 4621  | 257  | 8 (42 %)         | 2
NTRU Prime | 4621  | 1699 | 10 (48 %)        | 4
NTRU Prime | 4621  | 7681 | 13 (50 %)        | 3
NewHope    | 12289 | 257  | 8 (41 %)         | 2
NewHope    | 12289 | 1699 | 11 (44 %)        | 4
NewHope    | 12289 | 7681 | 13 (48 %)        | 3

TABLE 1: Analysis of our countermeasure for different
parameters. The minimum distance is the minimum number
of bit flips to circumvent the CRT checksum.
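The bit overhead and the random-fault estimate can be recomputed with a short script. It assumes that "overhead in bits" means the increase in bit length when moving from $q$ to $\hat{q} = q \cdot q'$, which reproduces the integer values of the table; the percentages are not recomputed, and the helper names are hypothetical.

```python
# Sketch: recompute the "overhead in bits" column of Table 1 and the
# random-fault success estimate (q - 1) / q^ (helper names are assumptions).
from fractions import Fraction

PARAMS = {"Kyber": 3329, "NTRU Prime": 4621, "NewHope": 12289}
CHECK_MODULI = (257, 1699, 7681)

def bits(m):
    """Number of bits needed to store a value in the range 0..m-1."""
    return (m - 1).bit_length()

def overhead_bits(q, qp):
    """Additional bits per coefficient when working modulo q * q'."""
    return bits(q * qp) - bits(q)

def random_fault_success(q, qp):
    """Estimate (q - 1) / q^ for an undetected, effective random fault."""
    return Fraction(q - 1, q * qp)

table = {(name, qp): overhead_bits(q, qp)
         for name, q in PARAMS.items() for qp in CHECK_MODULI}

for (name, qp), extra in table.items():
    p = float(random_fault_success(PARAMS[name], qp))
    print(f"{name:10s} q'={qp:5d} overhead={extra:2d} bits  P(random)={p:.2e}")
```

Since $(q - 1)/\hat{q}$ is only slightly below $1/q'$, the random-fault estimate is governed almost entirely by the choice of $q'$.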


4.3 Optimization and Control Flow Protection 


With our approach it is also possible to check that the correct
sequence of addition and multiplication operations in $R_{\hat{q}}$
was carried out. If the wrong computations are performed,
this is reflected in a mismatch of the final checkvalue. Thus,
our method can be used to prevent an attacker from using
an instruction skipping fault (see [38]) to suppress, e.g., the
addition of a noise vector $e$ to an RLWE sample $a \cdot s + e$.
To improve efficiency, constant values can already be stored
in the ring $R_{\hat{q}}$ during personalization of a device. This may
additionally prevent an attacker from redirecting a pointer to
a key to a data structure that contains only zero coefficients
in $R_q$, as the checksum will then not match with very high
probability. Also, it is not necessary to store the $t_i$. Instead, a
hash function $h$ can be computed on the checksums such that
$h(p_1 \bmod q', \ldots, p_k \bmod q') = h(t_1, \ldots, t_k)$.
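A minimal sketch of the hash-based check is given below. The choice of SHA-256, the two-byte encoding per coefficient, and the function names are assumptions made for illustration; the text above does not fix a hash function.

```python
# Sketch of the hash-based checksum comparison (SHA-256 and the encoding
# are assumptions; the paper does not specify a hash function).
import hashlib
import hmac

QP = 7681  # q' as used in the evaluation

def digest(values):
    """Hash a sequence of coefficients reduced modulo q' (2 bytes each)."""
    h = hashlib.sha256()
    for v in values:
        h.update((v % QP).to_bytes(2, "little"))
    return h.digest()

def check(p_coeffs, stored_digest):
    """Compare h(p mod q') with the stored h(t) instead of storing t."""
    return hmac.compare_digest(digest(p_coeffs), stored_digest)

t = [17, 4242, 7680]                                   # example checkvalues
stored = digest(t)                                     # computed offline
p = [ti + QP * k for k, ti in enumerate(t, start=1)]   # lifted results
ok = check(p, stored)
faulty = check([p[0] ^ 1] + p[1:], stored)
```

Only the fixed-size digest has to be stored, at the cost of one hash computation per check.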


4.4 Comparison to Existing Fault Countermeasures


In this section, we aim to compare our CRT approach to 
commonly used techniques against fault attacks. Apart from 
physical countermeasures, the following possibilities exist. 


• Redundancy: Re-calculating functions to prevent effective
faults is a commonly used strategy that prevents
single-fault attacks. Any effective fault will be
detected in the double computation as the results
of both calculations will differ. If it is possible to
repeat the same fault twice or more, the method
can be easily circumvented. Similar to masking, this
countermeasure requires a computational overhead
factor of at least two and is therefore quite expensive.

• Shuffling: In recent work, shuffling has also been
identified to be helpful against fault attacks. When
polynomial coefficients, for instance, are shuffled,
an attacker does not know where the fault was
injected [39]. The computational cost of this countermeasure
is low. On the other hand, it is not able to
explicitly detect inserted faults.

• Error-Correcting/Error-Detecting Codes: Error-correcting
or error-detecting codes (ECC/EDC) are
a general possibility to avoid double computation
but in general offer a lower error detection rate
than the redundancy approach. The CRT approach
is, to the best of our knowledge, the first EDC
to be analyzed in the context of lattice-based
cryptography.


5 COMBINATION OF RNR AND CRT COUNTERMEASURES


Many lattice-based cryptography algorithms depend on efficient
polynomial arithmetic using the NTT (see Section 2.2).
Therefore, the technique of efficiently transforming polynomials
into the NTT domain, multiplying them coefficient-wise,
and transforming them back to the normal domain should
remain applicable in a new ring $R_{\hat{q}} = R_{qq'}$ mandated by
the RNR or CRT countermeasure. This can be achieved by
choosing a ring $R_{q'}$ that contains a suitable primitive root
of unity $\omega_{q'}$. We can then combine the roots of unity and their
powers as

$$\omega_{\hat{q}} = \omega_q \cdot q' \cdot ({q'}^{-1} \bmod q) + \omega_{q'} \cdot q \cdot (q^{-1} \bmod q')$$
$$\omega_{\hat{q}}^{-1} = \omega_q^{-1} \cdot q' \cdot ({q'}^{-1} \bmod q) + \omega_{q'}^{-1} \cdot q \cdot (q^{-1} \bmod q')$$

to obtain roots of unity in $R_{\hat{q}}$.

For the NTT as defined in Kyber (see Section 2.3.1) with
$q = 3329$, we need a primitive 256-th root of unity. By
choosing $q' = 7681$, this property is fulfilled with $\omega_{q'} = 198$,
and the Kyber NTT only needs to be adapted to the larger
modulus, which also results in an update of the constants used
in the Montgomery reduction.
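The combination of the roots of unity can be checked numerically. The sketch below assumes the root $\omega_q = 17$ from the Kyber specification (it is not stated in this section) together with $\omega_{q'} = 198$ from the text; the helper names are hypothetical.

```python
# Sketch: combine the 256-th roots of unity modulo q and q' into one
# modulo q^ = q * q' (omega_q = 17 is taken from the Kyber specification).
Q, QP = 3329, 7681
QHAT = Q * QP
W_Q = 17       # primitive 256-th root of unity modulo q
W_QP = 198     # primitive 256-th root of unity modulo q' (value from the text)

def crt_combine(x_q, x_qp):
    """Element of Z_q^ that equals x_q modulo q and x_qp modulo q'."""
    return (x_q * QP * pow(QP, -1, Q) + x_qp * Q * pow(Q, -1, QP)) % QHAT

def has_exact_order(w, order, m):
    """For a power-of-two order: w^(order/2) = -1 implies exact order."""
    return pow(w, order // 2, m) == m - 1

w_hat = crt_combine(W_Q, W_QP)
w_hat_inv = crt_combine(pow(W_Q, -1, Q), pow(W_QP, -1, QP))
```

Because both component roots satisfy $\omega^{128} = -1$ in their respective rings, the combined value satisfies the same relation modulo $\hat{q}$ and thus has order exactly 256.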

To combine both countermeasures, it is helpful to consider
that Equation 3 in Section 4 is just a different way of
generating a redundant number representation for elements
in $\mathbb{Z}_q$ by multiplying the value $r[j]$ not just with $q$ but also
with the constant $q^{-1} \bmod q'$. The two countermeasures can
thus be combined by using random inputs $r_1, \ldots, r_l$ that
randomize the computation of $g_{R_{\hat{q}}}$ for each execution. To
be able to check the result, it is of course also necessary to
compute $g_{R_{q'}}(r_1, \ldots, r_l) = (t_1, \ldots, t_k)$. Moreover, the NTT
constants (i.e., $\zeta'$) have to be taken into account as well (see
Section 3). An attacker can now either attack $g_{R_{\hat{q}}}$ or $g_{R_{q'}}$
according to the analysis and assumptions in Section 
However, using random constants implies that the danger 
of zeroing attacks might be prevented by explicitly avoiding 
values r[j] = 0. 
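The combined countermeasure can be sketched for a single coefficient as follows. This is an illustration under simplifying assumptions: it works on integers instead of polynomials, uses Python's `secrets` module as the randomness source, and the function names are hypothetical.

```python
# Sketch of the combined RNR/CRT idea on single coefficients (assumptions:
# scalar arithmetic instead of polynomials, hypothetical function names).
import secrets

Q, QP = 3329, 7681
QHAT = Q * QP
C_R = Q * pow(Q, -1, QP)      # equals 1 mod q' and 0 mod q
C_V = QP * pow(QP, -1, Q)     # equals 1 mod q  and 0 mod q'

def fresh_checkvalue():
    """Uniform nonzero value in [1, q'), explicitly avoiding r[j] = 0."""
    return 1 + secrets.randbelow(QP - 1)

def randomized_lift(v):
    """Return (z, r): a redundant representation z of v and its checkvalue r."""
    r = fresh_checkvalue()
    return (r * C_R + v * C_V) % QHAT, r

def protected_mul_add(a, e, s):
    """Compute a*e + s in Z_q^ and verify it against a recomputation in Z_q'."""
    (za, ra), (ze, re_), (zs, rs) = map(randomized_lift, (a, e, s))
    p = (za * ze + zs) % QHAT          # randomized, lifted computation
    t = (ra * re_ + rs) % QP           # checkvalue, recomputed per execution
    if p % QP != t:
        raise ValueError("fault detected")
    return p % Q

z, r = randomized_lift(1234)
```

Every execution uses a different representation `z` of the same value, while `z % Q` always recovers the input; the price is that the checkvalue can no longer be precomputed.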


6 EVALUATION 


In this section we evaluate the countermeasures from the 
previous sections with regard to their performance and 
effectiveness. We focus on protecting arithmetic operations 
of the decryption algorithm of Kyber (see Algorithm 2).
Implementation              | Source    | Cycle Count
Unprotected NTT (reference) |           | 6829
Masked NTT (2x unprot.)     | [29]      | 2 · 6829 = 13 658
NTT with RNR                | This work | 7737

TABLE 2: Cycle counts of our Kyber NTT protected by the
RNR on a Cortex-M4.


6.1 Performance of the RNR Countermeasure

We perform our evaluation on an ARM Cortex-M4 32-bit
microcontroller. The ARM Cortex-M4 was chosen due to
its popularity when evaluating PQC [5], [40], and due
to the availability of already highly optimized code for
comparison.

As development environment we use the Keil Toolchain
MDK Plus 5.29/µVision 5.29 with the ARM Compiler Version 5.
Our measurements for the Cortex-M4 architecture
are performed using an STM32F4-DISCOVERY board with
an STM32F407 that can run at up to 168 MHz and provides 1
Mbyte of flash memory and 192 kByte of RAM. For our
measurements, we set the clock frequency to 24 MHz. The
Cortex-M4 architecture features an instruction set extension,
namely ARMv7E-M with the instructions uadd16, usub16,
sasx, and ssax, that makes a very fast NTT possible
as proposed in [40]. The target device includes the system
timer (SysTick), which is used for measuring cycle counts.

In Table 2, we provide measured cycle counts of side-channel
protected (see Section 3) implementations of the
NTT of round 3 Kyber [15]. The reference NTT implementation
is constant-time but does not contain further countermeasures
and is taken from the PQM4 library [5].
This is a highly optimized NTT for the specific architecture.
For comparison, we provide cycle counts for a masked
implementation in Table 2. It requires the computation of
the NTT on two shares and thus consumes twice as many
cycles as the reference implementation. Our implementation
of the NTT protected by the RNR is realized in assembly
and uses the concept of the 32-bit assembly NTT from the
Dilithium implementation in [42]. We adapted the Dilithium
NTT, which originally uses the modulus $q = 2^{23} - 2^{13} + 1$,
to the Kyber case (see Section for details on the
specific NTT used by Kyber) and changed the modulus to
$qq' = 3329 \cdot 7681 = 25\,570\,049$.

For masking and the RNR approach, the same number of
random bits is used. As a consequence, we do not include
the time required for sampling these values in our cycle
counts. As mentioned, we set $q' = 7681$ and use 12 bits
of randomness for each coefficient. It is not necessary to
randomize the NTT constants, as for our parameters no constant
contains $q'$ as a factor and thus no de-randomization
happens (see Section 3.3). The RNR, therefore, requires a
total of $n \cdot 12 = 3072$ random bits. For the masking, we use
12 random bits per coefficient and thus need $n \cdot 12 = 3072$
bits of randomness as well.
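Changing the modulus also changes the constants of the Montgomery reduction. The following sketch shows how such constants could be derived for the combined modulus 25 570 049. It is a textbook unsigned Montgomery reduction with radix $2^{32}$, chosen here as an assumption for illustration; it does not reproduce the assembly implementation evaluated above, and all names are hypothetical.

```python
# Illustrative sketch only: textbook unsigned Montgomery arithmetic for the
# combined modulus (radix 2^32 and all names are assumptions).
Q, QP = 3329, 7681
QHAT = Q * QP                            # 25570049, fits into 25 bits
R_BITS = 32
R = 1 << R_BITS
NEG_QHAT_INV = (-pow(QHAT, -1, R)) % R   # -(q^)^{-1} mod 2^32
R2 = (R * R) % QHAT                      # used to enter the Montgomery domain

def montgomery_reduce(a):
    """Return a * R^{-1} mod q^ for 0 <= a < q^ * R."""
    m = ((a & (R - 1)) * NEG_QHAT_INV) & (R - 1)
    t = (a + m * QHAT) >> R_BITS         # exact division: low 32 bits are zero
    return t - QHAT if t >= QHAT else t

def to_mont(x):
    """Map x to its Montgomery representation x * R mod q^."""
    return montgomery_reduce((x % QHAT) * R2)

def from_mont(x):
    """Map a Montgomery representation back to the ordinary residue."""
    return montgomery_reduce(x)

def mont_mul(x, y):
    """Multiply two Montgomery-domain values; the result stays in the domain."""
    return montgomery_reduce(x * y)

a, b = 123456, 7654321
product = from_mont(mont_mul(to_mont(a), to_mont(b)))
```

Because $\hat{q}$ stays below $2^{25}$, products of two reduced coefficients fit into 64 bits, which is why a 32-bit NTT in the style of the Dilithium implementation is a natural starting point.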


6.2 Performance of the CRT and Combined Countermeasure


For the evaluation of our CRT-based fault detection mechanism
(see Section 4), we use the Kyber parameter $q = 3329$
and set the same $q' = 7681$ that we used to evaluate the
RNR. This allows us to reuse the NTT code and Montgomery
reduction. The protection level obtained by this
parameter set is detailed in Table 1. We measured the cost
of the combination operation from Section 4 and the NTT in
the larger combined ring $R_{\hat{q}}$. The results for the fault countermeasure
are listed in Table 3. We compare its performance
to the state-of-the-art method of redundant computation. All
in all, an NTT protected by our CRT-based method requires
11 619 cycles and thus has a smaller computational overhead
than redundant computation with 13 658 cycles. In addition, we note that
the performance of the CRT approach gets more favorable
when a large number of operations is performed in the lifted
domain.

Implementation             | Source    | Cycle Count
Redundant NTT (2x unprot.) |           | 2 · 6829 = 13 658
NTT with CRT               | This work | 11 619
→ Combination              |           | 3882
→ NTT in $R_{\hat{q}}$     |           | 7737

TABLE 3: Cycle counts for our Kyber NTT protected against
faults by the CRT countermeasure on a Cortex-M4.

The evaluation of the combined countermeasure (see
Section 5) is performed in a similar manner. A state-of-the-art
implementation with protection against both side-channel
and fault attacks is assumed to be realized with
a redundant computation of a secret split into two shares.
This leads to a larger overhead, as can be seen in Table 4. The
CRT and RNR methods provide a significant speedup over this
state-of-the-art method (18 448 versus 27 316 cycles). The combined method differs from
the CRT method as the checksum cannot be precomputed
anymore. This leads to an additional computation of an NTT
in $R_{q'}$.

Implementation             | Source    | Cycle Count
Redundant and masked NTT   |           | 4 · 6829 = 27 316
NTT with CRT and RNR       | This work | 18 448
→ Combination              |           | 3882
→ NTT in $R_{q'}$          |           | 6829
→ NTT in $R_{\hat{q}}$     |           | 7737

TABLE 4: Cycle counts for our Kyber NTT combining the
RNR and CRT-based countermeasure on a Cortex-M4.


6.3 Effectiveness against Side-Channel Attacks 


Fig. 6: t-values of non-specific t-test for our Kyber NTT protected
by the RNR with 1000 traces and RNG off ($q' = 7681$)

Fig. 7: t-values of non-specific t-test for our Kyber NTT
protected by the RNR with 1000 traces and parameters
leading to derandomization due to NTT constants (here,
$q' = 257$ as discussed in Section 3.2)

To evaluate the effectiveness of the RNR and our implementation,
we performed a power analysis using the
ChipWhisperer Lite platform with an STM32F303 target
that is based on a Cortex-M4 processor core. We captured
the power consumption of the NTT variants evaluated in
Section 6.1 and Section 6.2 and use $q = 3329$ and $q' = 7681$
unless otherwise stated. Due to the usage of the ChipWhisperer
platform and its synchronous capture method, all
traces are well synchronized. Thus, a lower number of traces
is required in comparison to traditional side-channel setups.
The main goal of our experiments is to identify
possible gaps between theory and practice. Therefore, we
apply the non-specific t-test evaluation methodology [45].
In this way, we detect possible leakages that are not part of any
specific leakage model. At each time point, we calculate the
t-test statistic

$$t = \frac{\mu_0 - \mu_1}{\sqrt{\frac{s_0^2}{n_0} + \frac{s_1^2}{n_1}}}$$

where $\mu_0$, $s_0^2$, and $n_0$ are the sample mean, variance, and
sample size of the power traces with fixed input and $\mu_1$, $s_1^2$,
and $n_1$ those of the power traces with random input, respectively.
If the power traces of a constant-time implementation
cause this value to exceed an absolute
value of 4.5 at any point in time, we can detect a power difference between the
usage of a fixed input and a random one, and therefore the
implementation is considered not sufficiently secured.
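The statistic can be computed per time point as sketched below. The code uses only the Python standard library and hypothetical function names; it is meant to illustrate the test, not the capture and analysis scripts used for the figures.

```python
# Sketch of the non-specific (fixed vs. random) t-test per time point.
from math import sqrt
from statistics import mean, variance

THRESHOLD = 4.5

def welch_t(fixed, rand):
    """t statistic for one time point from two groups of power samples."""
    m0, m1 = mean(fixed), mean(rand)
    v0, v1 = variance(fixed), variance(rand)     # sample variances
    return (m0 - m1) / sqrt(v0 / len(fixed) + v1 / len(rand))

def leaking_points(fixed_traces, random_traces, threshold=THRESHOLD):
    """Indices of time points whose |t| exceeds the threshold."""
    hits = []
    for i in range(len(fixed_traces[0])):
        t = welch_t([tr[i] for tr in fixed_traces],
                    [tr[i] for tr in random_traces])
        if abs(t) > threshold:
            hits.append(i)
    return hits

# Toy traces with two time points; only the second one differs clearly.
fixed_traces = [[1.0, 10.0], [2.0, 10.1], [3.0, 9.9]]
random_traces = [[2.0, 20.0], [3.0, 20.1], [4.0, 19.9]]
hits = leaking_points(fixed_traces, random_traces)
```

In the toy data, the first time point yields a small |t| of about 1.22 and the second a very large one, so only index 1 is reported.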

For the initial tests, we use the down-sampling feature 
of the ChipWhisperer to capture the entire NTT because at 
most 24400 samples for each execution can be captured. 
First, we verify our setup and t-test implementation by 
measuring our Kyber NTT protected by the RNR but with 
RNG off, as shown in Figure 6. As expected, the test clearly
shows leakage as data was processed without sufficient 
randomization. 

In Section 3.2, we discussed that a naive instantiation of
the RNR countermeasure could lead to a rather insecure
implementation. To practically verify this observation, we
measured the Kyber NTT protected by the RNR with inappropriately
chosen parameters ($q' = 257$ as in Section 3.2)
and show the results in Figure 7. The peak at the end of
the figure marks the position where a multiplication by an
NTT constant that is a multiple of $q' = 257$ is performed.
As expected, the randomization is not sufficient and the
threshold of the t-test is exceeded.

We additionally evaluated the NTT for $q' = 7681$
where 12 random bits are used to randomize every input
coefficient. The result of this test for 10 000 traces is shown in
Figure 8. It does not show any leakage beyond the threshold.




Fig. 8: t-values of non-specific t-test for our Kyber NTT 
protected by the RNR with 10000 traces (q' = 7681, 12 bits 
of randomness for every coefficient) 


To verify our recommended implementation in more
detail, we disable the downsampling feature of the ChipWhisperer.
In Figure 9, we show the result of the t-test
on the power traces of the first, second, and last round of
the Cooley-Tukey NTT implementation (see Appendix,
where each round is equal to one iteration in the outer for-loop)
using the combination of the RNR with the CRT as
introduced in Section 5. None of these traces shows leakage
beyond the threshold. This verifies the assumption
that the combination of the RNR and the CRT can be used
to randomize computations in lattice-based cryptography
if parameters are chosen correctly (see Section 3.2). Note
that we increased the range of random checkvalues in comparison
to the RNR evaluation from [0, 4096) to [0, 7681)
and the number of captured traces is increased to 100 000.
Additionally for this practical evaluation, q’ is chosen larger 
than in the theoretical analysis of Section Even with 
100 000 traces and the low noise level for the ChipWhisperer 
Lite, we cannot identify any leakage. 


6.4 Kyber Decryption 


For combined protection, the overhead factor of four on the
Cortex-M4, obtained from masking and redundant computations,
is reduced to around 2.7 for one execution of the
NTT (see Table 4). We also applied the combined countermeasures
to the arithmetic part

$v - \mathrm{INTT}(\hat{s} \circ \mathrm{NTT}(u))$

of the KYBER768.CPA decryption in Algorithm 2. The cycle
counts in Table 5 therefore only include the arithmetic
operations of the decryption in the implementation [5] on
the Cortex-M4. Nevertheless, we manage to reduce the computational
overhead factor for a leakage- and fault-protected
implementation from 2.9 to around 2.2 on the Cortex-M4.
If more shared operations are needed, the RNR and
CRT method will gain additional performance compared to
redundant computations of the shares.
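The overhead factors quoted in this section follow directly from the measured cycle counts in Tables 2, 4, and 5; the short calculation below only restates those numbers as ratios.

```latex
\begin{align*}
\text{NTT, masking and redundancy:}\quad & \frac{4 \cdot 6829}{6829} = 4 \\
\text{NTT, RNR and CRT:}\quad & \frac{18\,448}{6829} \approx 2.70 \\
\text{Decryption arithmetic, masking and redundancy:}\quad & \frac{229\,922}{79\,509} \approx 2.89 \\
\text{Decryption arithmetic, RNR and CRT:}\quad & \frac{174\,858}{79\,509} \approx 2.20 \\
\text{Resulting speed-up:}\quad & \frac{229\,922}{174\,858} \approx 1.31 \\
\text{NTT with RNR only:}\quad & \frac{7737}{6829} \approx 1.13
\end{align*}
```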


Implementation       | Protection  | Source    | Cyc. Count
Unprotected          | none        |           | 79 509
Masking & Redundancy | DPA & Fault |           | 229 922
RNR & CRT            | DPA & Fault | This work | 174 858

TABLE 5: Cycle counts for $v - \mathrm{INTT}(\hat{s} \circ \mathrm{NTT}(u))$ in
KYBER768.CPA.DEC decryption on Cortex-M4


7 CONCLUSION AND FUTURE WORK


In this work, we have analyzed the redundant number
representation (RNR) and proposed the application of
Chinese remainder theorem (CRT) techniques to protect
arithmetic operations in lattice-based cryptography. A combination
of both methods leads to a speed-up factor of
roughly 1.3 compared to the straightforward approach of
masking and redundant calculation for the protection of linear
parts of Kyber decryption on the ARM Cortex-M4. Both
the RNR and CRT offer a security-time trade-off that can be
adjusted according to the specific use case by changing the
parameter $q'$. Moreover, due to the low performance overhead
of only 13 percent of an RNR-protected NTT, the RNR
approach could be used as an additional countermeasure in
a masked implementation to possibly achieve higher-order
protection or some resistance against single-trace attacks. A
combination of the methods with masking countermeasures
would be an interesting topic for future work. Additional
future work may consist of the optimization of the NTT for
32-bit coefficients on architectures other than the Cortex-M4.
It may also be interesting to evaluate the impact of using the
RNR to protect against single-trace attacks as it increases
the number of possible intermediate values. Moreover, a
practical evaluation of the CRT countermeasure using laser
fault injection could further substantiate our theoretical
model and analysis but is considered out of the scope of this
work. The provided strength against multiple faults is also
still open in theory and in practice. And even though ring
extension methods seem more suitable for microcontroller
implementations due to the fixed size of the ALU, it might
still be interesting to implement them in hardware. As
already shown in [14], it is for example possible on FPGAs
to choose parameters that fit into the width of DSP or RAM
hard macros. Another interesting avenue for future work
might be the search for optimal parameters for the CRT and
an improvement of the brute-force method to compute the
minimum distance for larger moduli often used in lattice-based
digital signatures.